Background

Eels were tracked from the Belgian coast (Nieuwpoort) through the North Sea as they migrate to their spawning grounds in the Atlantic Ocean. See our article for the migration routes. In brief, the routes were modeled through geolocation modelling based on logged data on temperature and depth obtained by data loggers that were externally attached to the eels. For more info on the method, we refer to the aforementioned article. In general, we found that the eels can take 2 routes: a northern route over the UK or a southern route through the English Channel.

Now that we analysed the horizontal movement patterns (i.e. the migration routes), we want to analyse the vertical movement patterns. This can provide information on how eels orient themselves or move in a bio-energetic efficient way. Two of our key questions is if the eels have a circadian pattern in their vertical movement and if they apply selective tidal stream transport (STST). It is known that eels apply a diel vertical migration pattern in the Atlantic Ocean: they dive to depths of ca. 800 m during daytime, but swim at depths of ca. 200 m during night time (see this article). However, it is unknown if eels show this or another pattern on the shallow continental shelf. Considering STST, this mode of transport has been illustrated for eels in estuaries. Since the North Sea and the English Channel have strong prevailing tidal currents, it is plausible that eels apply STST as well and hence behave differently during the ebbing and flooding tide. To gain more insight in the vertical movement pattern, we want to analyse the eel’s swimming depth (depth measurements every 5 minutes) over time in relation to the next relevant explanatory variables, being:

*Water currents are calculated as eastward and northward velocity vectors. Since the eels took 2 routes, I would split up the analysis for eels taking the southern route and eels taking the northern route.

Set time zone

Sys.setenv(TZ='GMT')
Sys.timezone()
## [1] "GMT"

Load packages

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(mgcv)
## Loading required package: nlme
## 
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
## 
##     collapse
## This is mgcv 1.8-40. For overview type 'help("mgcv-package")'.
#library(splines) # for bs function
#library(REdaS)
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
#library(sjPlot)
#library(EnvStats)
library(pracma)
## 
## Attaching package: 'pracma'
## The following object is masked from 'package:car':
## 
##     logit
## The following object is masked from 'package:mgcv':
## 
##     magic
## The following object is masked from 'package:purrr':
## 
##     cross
library(nlme) # for gls()
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
library(forecast) # for auto.arima() function
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## 
## Attaching package: 'forecast'
## The following object is masked from 'package:nlme':
## 
##     getResponse
library(tseries) # for Augmented Dickey-Fuller Test (adf.test())
library(visreg)
library(itsadug)
## Loading required package: plotfunctions
## 
## Attaching package: 'plotfunctions'
## The following object is masked from 'package:ggplot2':
## 
##     alpha
## Loaded package itsadug 2.4 (see 'help("itsadug")' ).

Load dataset of 42 eels and set columns

data <- read.csv("./data/interim/data_circadian_tidal_moon_sun_5min.csv")
data$ID <- factor(data$ID)
data$datetime <- ymd_hms(data$datetime)
data$night_day <- factor(data$night_day)
data <- data %>%
            rename(direction_x = U,
                   direction_y = V)

Subset 1 eel for testing

data_1eel <- filter(data, ID == "16031")

ACF plot

Create an autocorrelation function plot to explore the cyclic signals in the data. The horizontal blue lines in the plot indicate the confidence interval in the correlogram. First create plot for total dataset.

forecast::Acf(data_1eel$corrected_depth, type = c("correlation"), lag.max=max(dim(data_1eel)), plot = TRUE)  

Next, create the ACF plot for the first 500 observations.

forecast::Acf(data_1eel$corrected_depth, type = c("correlation"), lag.max=500, plot = TRUE)

In the plot above, we see 2 cycles subsequently returning. The first is probably related to a tidal signal, as it occurs ca. every 12 hours at lag 144: (12 hours x 60 minutes) / 5 minutes = 144

The second is probably related to a circadian rythm with a 24 h pattern at ca. lag 288: (24 hours x 60 minutes) / 5 minutes = 288

To illustrate this, we added a green vertical line at lag 144 and a blue line at lag 288 to the plot.

vals_list <- forecast::Acf(data_1eel$corrected_depth, type = c("correlation"), lag.max=max(dim(data_1eel)), plot = FALSE)
plot(vals_list[[1]], xlim=c(0, 1000))
abline(v = 144, col = "darkgreen")
abline(v = 288, col = "darkblue")

Depth data processing

The depth sensors are not 100% accurate. Hence, it can occur that when an eel is swimming close to the surface, the depth sensor registers a value as if the eel would be above the surface. Also, the depth sensors drift over time. A linear regression between the start of data logging (= when the tag was at zero atmospheric pressure) and the moment the tag surfaced (= when the tag was again at zero atmospheric pressure) was conducted at an earlier stage (to generate the trajectories). Hence the variable name ‘corrected’ depth.

plot(data_1eel$corrected_depth, col = rgb(red = 0.5, green = 0.5, blue = 0.5, alpha = 0.2))

summary(data_1eel$corrected_depth)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -169.184  -32.699  -17.962  -26.144   -4.303    1.078

Add daily max depth to dataset

data_max_depth <- data_1eel %>%
  group_by(ID, Date) %>%
  summarise(max_depth = min(corrected_depth))
## `summarise()` has grouped output by 'ID'. You can override using the `.groups`
## argument.
data_1eel <- left_join(data_1eel, data_max_depth, by = c("ID","Date"))

Add daily bathymetry based on the trajectory files obtained from the geolocation models

# Link trajectory data (source trajectory data script)
tr_data <- read_csv("./data/external/trajectory_data/eel_trajectories.csv")
## Rows: 1962 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): Species, Sp_ID, Date, Max_Depth, Max_Temp
## dbl (18): ID, SD_Depth, Av_Depth, Min_Depth, SD_Temp, Av_Temp, Min_Temp, Tem...
## lgl  (2): Max_PoV_WF, Behav_Class
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Select columns
tr_data <- dplyr::select(tr_data, ID, Date, Max_Depth)
tr_data <- rename(tr_data, 
                  max_depth_bathymetry = Max_Depth)

# Process columns
tr_data$Date <- dmy(tr_data$Date)
tr_data$ID <- factor(tr_data$ID)

# Remove double dates per eel (ID)
#tr_data <- tr_data[!duplicated(tr_data[c('ID','Date')]),]
tr_data <- tr_data %>%     # Add ID number to duplicate dates
  group_by(ID, Date) %>%
  add_tally()

duplicates <- filter(tr_data, n == 2)   # Filter duplicate dates
duplicates <- duplicates %>%             # Add ID number to distinguish between first and second duplicate
  mutate(number_id = row_number())
duplicates <- filter(duplicates, number_id == 2)  # Filter second duplicates

tr_data <- filter(tr_data, n != 2)   # Remove duplicate dates from tracking dataset

# Bind 'duplicates' dataset with second duplicates to tracking dataset
tr_data <- ungroup(tr_data)
tr_data$n <- NULL
duplicates <- ungroup(duplicates)
duplicates$n <- NULL
duplicates$number_id <- NULL

tr_data <- rbind(tr_data, duplicates)

# Select relevant eels
tr_data <- filter(tr_data, ID == "16031" )
tr_data$ID <- factor(tr_data$ID) # rerun 'factor()' so the number of levels is set accurately

data_1eel$Date <- as.Date(data_1eel$Date)
data_1eel <- left_join(data_1eel, tr_data, by = c('ID', 'Date'))

# Remove line with max depth bathymetry > 300 m --> remove location in deep sea
data_1eel$max_depth_bathymetry <- -1*(as.numeric(data_1eel$max_depth_bathymetry))
data_1eel <- filter(data_1eel, max_depth_bathymetry >= -300)

Plot the daily depth range, the max depth and the max depth bathymetry. The latter 2 do not differ a lot because one of the assumptions in the geolocation model is that the eel’s daily depth is the daily max bathymetry.

par(mfrow=c(3,1))
plot(data_1eel$Date, data_1eel$corrected_depth)
plot(data_1eel$Date, data_1eel$max_depth)
plot(data_1eel$Date, data_1eel$max_depth_bathymetry)
Three depth variables over time for 1 eel: corrected swimming depth, daily maximum depth and the daily maximum bathymetry obtained via the geolocation model.

Three depth variables over time for 1 eel: corrected swimming depth, daily maximum depth and the daily maximum bathymetry obtained via the geolocation model.

Calculate the relative depth

Eels are known to reside near the bottom. Therefore, it can be reasonably assumed that the daily max swimming depth corresponds with the max daily bathymetry. Hence, the relative depth is calculated as the swimming depth divided by the max depth of that day.

data_1eel$rel_depth <- data_1eel$corrected_depth / data_1eel$max_depth
plot(data_1eel$Date, data_1eel$rel_depth, col = rgb(red = 0.5, green = 0.5, blue = 0.5, alpha = 0.4))
Plot of the relative depth for 1 eel.

Plot of the relative depth for 1 eel.

Exploratory plots

Vertical depth pattern over time

ggplot(data_1eel, aes(x = datetime,
                                   y = corrected_depth)) +
  geom_line() +
  theme_minimal() +
  ylab("Depth (m)") +
  xlab("Date") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  scale_x_datetime(date_breaks  = "1 day") 
Vertical movement pattern for 1 eel.

Vertical movement pattern for 1 eel.

Plot detail of vertical movement behaviour

In this plot a distinctive behaviour can be observed between day and night; the vertical red line indicates midnight. During the daytime, the eels have a steady depth, while at night they have a higher vertical amplitude in the water column. Also, there are moments where the eel remains near the bottom which I think is attributed to the tides.

data_1eel_10days <- filter(data_1eel,datetime >= "2019-02-01 00:00:00", datetime <= "2019-02-11 00:00:00")

# Create line every 24 hours
gnu <-  seq.POSIXt(from = lubridate::floor_date(data_1eel_10days$datetime[1], "day"), to= data_1eel_10days$datetime[nrow(data_1eel_10days)], by = 86400)
class(lubridate::floor_date(data_1eel_10days$datetime[1], "day"))
## [1] "POSIXct" "POSIXt"
ggplot(data_1eel_10days, aes(x = datetime,
                                       y = corrected_depth)) +
  geom_line() +
  theme_minimal() +
  ylab("Depth (m)") +
  xlab("Date") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  scale_x_datetime(date_breaks  ="1 hour") +
  geom_vline(xintercept=gnu, color = "red", size = 1) 
10-day detail of the vertical movement pattern for 1 eel.

10-day detail of the vertical movement pattern for 1 eel.

Relative depth over time

ggplot(data_1eel, aes(x = datetime,
                                   y = rel_depth)) +
  geom_line() +
  theme_minimal() +
  ylab("Relative depth (m)") +
  xlab("Date") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  scale_x_datetime(date_breaks  = "1 day") 
The relative depth for 1 eel.

The relative depth for 1 eel.

Plot detail of relative depth

ggplot(data_1eel_10days, aes(x = datetime,
                                       y = rel_depth)) +
  geom_line() +
  theme_minimal() +
  ylab("Relative depth (m)") +
  xlab("Date") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  scale_x_datetime(date_breaks  ="1 hour") +
  geom_vline(xintercept=gnu, color = "red", size = 1) 
10-day detail of the relative depth for 1 eel.

10-day detail of the relative depth for 1 eel.

Temperature over time

ggplot(data_1eel, aes(x = datetime,
                                 y = temperature)) +
  geom_line() +
  theme_minimal() +
  ylab("Temperature (°C))") +
  xlab("Date") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  scale_x_datetime(date_breaks  = "1 day") 
Water temperature pattern for 1 eel.

Water temperature pattern for 1 eel.

Plot detail of temperature

ggplot(data_1eel_10days, aes(x = datetime,
                                   y = temperature)) +
  geom_line() +
  theme_minimal() +
  ylab("Temperature (°C))") +
  xlab("Date") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  scale_x_datetime(date_breaks  ="1 hour") +
  geom_vline(xintercept=gnu, color = "red", size = 1) 
10-day detail of the water temperature pattern for 1 eel.

10-day detail of the water temperature pattern for 1 eel.

Plot relative depth over temperature

ggplot(data_1eel, aes(x = temperature,
                                 y = rel_depth)) +
  geom_point(alpha = 0.1) +
  theme_minimal() +
  ylab("Relative depth (m)") +
  xlab("Temperature (°C)") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) 
Relative depth over temperature for 1 eel.

Relative depth over temperature for 1 eel.

Plot detail relative depth over temperature

ggplot(data_1eel_10days, aes(x = temperature,
                                 y = rel_depth)) +
  geom_point() +
  theme_minimal() +
  ylab("Relative depth (m)") +
  xlab("Temperature (°C)") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) 
10-day detail of the relative depth over temperature pattern for 1 eel.

10-day detail of the relative depth over temperature pattern for 1 eel.

Plot bathymetry over time

ggplot(data_1eel, aes(x = datetime,
                                 y = max_depth)) +
  geom_point() +
  theme_minimal() +
  ylab("Depth (m)") +
  xlab("Date") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  scale_x_datetime(date_breaks  = "1 day") 
Daily max depth for 1 eel.

Daily max depth for 1 eel.

Plot vertical depth pattern over bathymetry

ggplot(data_1eel, aes(x = max_depth,
                                 y = corrected_depth)) +
  geom_point() +
  theme_minimal() +
  ylab("Depth (m)") +
  xlab("Date") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) 
Vertical movement pattern over the daily max depth for 1 eel.

Vertical movement pattern over the daily max depth for 1 eel.

Plot relative depth over eastward current velocity vector

ggplot(data_1eel, aes(x = direction_x,
                                       y = rel_depth)) +
  geom_point(alpha = 0.1) +
  theme_minimal() +
  ylab("Relative depth (m)") +
  xlab("Eastward current velocity (m/s)") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14))
## Warning: Removed 4332 rows containing missing values (geom_point).
Relative depth over eastward current velocity for 1 eel.

Relative depth over eastward current velocity for 1 eel.

Plot detail relative depth over eastward current velocity vector

ggplot(data_1eel_10days, aes(x = direction_x,
                                       y = rel_depth)) +
  geom_point() +
  theme_minimal() +
  ylab("Relative depth (m)") +
  xlab("Eastward current velocity (m/s)") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14))
## Warning: Removed 960 rows containing missing values (geom_point).
10-day detail of the relative depth over eastward current velocity for 1 eel.

10-day detail of the relative depth over eastward current velocity for 1 eel.

Plot relative depth over northward current velocity vector

ggplot(data_1eel, aes(x = direction_y,
                                   y = rel_depth)) +
  geom_point(alpha = 0.1) +
  theme_minimal() +
  ylab("Relative depth (m)") +
  xlab("Northward current velocity (m/s)") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) 
## Warning: Removed 4332 rows containing missing values (geom_point).
Relative depth over northward current velocity for 1 eel.

Relative depth over northward current velocity for 1 eel.

Plot detail relative depth over northward current velocity vector

ggplot(data_1eel_10days, aes(x = direction_y,
                                   y = rel_depth)) +
  geom_point() +
  theme_minimal() +
  ylab("Relative depth (m)") +
  xlab("Northward current velocity (m/s)") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) 
## Warning: Removed 960 rows containing missing values (geom_point).
10-day detail of the relative depth over northward current velocity for 1 eel.

10-day detail of the relative depth over northward current velocity for 1 eel.

Plot relative depth pattern over day and night

Note that there is not a big difference because the depth range between day and night is equal, but the pattern is not.

ggplot(data_1eel, aes(x = night_day,
                                   y = rel_depth)) +
  geom_point(alpha = 0.1) +
  theme_minimal() +
  ylab("Relative depth (m)") +
  xlab("Circadian phases") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) 
Relative depth during daytime and at night for 1 eel.

Relative depth during daytime and at night for 1 eel.

Plot relative depth pattern over illuminated moon phase

ggplot(data_1eel, aes(x = 100*moon_fraction,
                                   y = rel_depth)) +
  geom_point(alpha = 0.2) +
  theme_minimal() +
  ylab("Relative depth (m)") +
  xlab("Illuminated moon phase (%)") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) 
Relative depth over illuminated fraction of the moon for 1 eel.

Relative depth over illuminated fraction of the moon for 1 eel.

Plot detail relative depth pattern over illuminated moon phase

ggplot(data_1eel_10days, aes(x = 100*moon_fraction,
                                   y = rel_depth)) +
  geom_point() +
  theme_minimal() +
  ylab("Relative depth (m)") +
  xlab("Illuminated moon phase (%)") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) 
10-day detail of the relative depth over illuminated fraction of the moon for 1 eel.

10-day detail of the relative depth over illuminated fraction of the moon for 1 eel.

Plot relative depth pattern over sun altitude

ggplot(data_1eel, aes(x = sun_altitude,
                                   y = rel_depth)) +
  geom_point(alpha = 0.2) +
  theme_minimal() +
  ylab("Relative depth (m)") +
  xlab("Sun altitude (radians)") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) 
The relative depth over the sun altitude for 1 eel.

The relative depth over the sun altitude for 1 eel.

Plot detail relative depth pattern over sun altitude

ggplot(data_1eel_10days, aes(x = sun_altitude,
                                   y = rel_depth)) +
  geom_point() +
  theme_minimal() +
  ylab("Relative depth (m)") +
  xlab("Sun altitude (radians)") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) 
10-day detail of the relative depth over the sun altitude for 1 eel.

10-day detail of the relative depth over the sun altitude for 1 eel.

Plot relative depth pattern over sun class i.e. sunset or sunrise

#plot(data_1eel$sun_altitude)
data_1eel$sun_class <- NA
for (i in 2:dim(data_1eel)[1]){
  if (data_1eel$sun_altitude[i] > data_1eel$sun_altitude[i-1]){
    data_1eel$sun_class[i] = "rising"
  } else{
    data_1eel$sun_class[i] = "setting"
  }}


data_1eel$sun_class <- factor(data_1eel$sun_class)

Note that there is not a big difference between sunrise and sunset because the depth range between day and night is equal, but the pattern is not. The NA is due to the first record of the dataset, for which a sun class could not be attributed according to the rule in the if-else loop.

ggplot(data_1eel, aes(x = sun_class,
                                   y = rel_depth)) +
  geom_point(alpha = 0.2) +
  theme_minimal() +
  ylab("Relative depth (m)") +
  xlab("Sun phase") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) 
Relative depth over the two different sun classes (sunrise and sunset) for 1 eel.

Relative depth over the two different sun classes (sunrise and sunset) for 1 eel.

Plot relative depth pattern over sun azimuth

ggplot(data_1eel, aes(x = sun_azimuth,
                                   y = rel_depth)) +
  geom_point(alpha = 0.2) +
  theme_minimal() +
  ylab("Relative depth (m)") +
  xlab("Sun azimuth (radians)") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) 
Relative depth over the sun azimuth for 1 eel.

Relative depth over the sun azimuth for 1 eel.

Plot detail relative depth pattern over sun azimuth

ggplot(data_1eel_10days, aes(x = sun_azimuth,
                                   y = rel_depth)) +
  geom_point() +
  theme_minimal() +
  ylab("Relative depth (m)") +
  xlab("Sun azimuth (radians)") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) 
10-day detail of the relative depth over the sun azimuth for 1 eel.

10-day detail of the relative depth over the sun azimuth for 1 eel.

Scale variables

For model construction, the exploratory variables need to be scaled. Max value should be between 0.5 - 5.

# direction_x
summary(data_1eel$direction_x)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  -1.756  -0.284   0.002   0.011   0.312   1.579    4332
#iqr <- 0.312  - (-0.284 )   #3rd QU - 1st Qu
#data_1eel$direction_x_centre <- (data_1eel$direction_x - median(data_1eel$direction_x))/iqr

# direction_y
summary(data_1eel$direction_y)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  -0.771  -0.172   0.007   0.015   0.199   1.017    4332
#iqr <- 0.199 - (-0.172)   #3rd QU - 1st Qu
#data_1eel$direction_y_centre <- (data_1eel$direction_y - median(data_1eel$direction_y))/iqr

# Temperature
summary(data_1eel$temperature)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.344   9.281   9.938   9.809  10.688  12.375
#iqr <- 10.688  - (9.266 )   #3rd QU - 1st Qu
#data_1eel$temperature_centre <- (data_1eel$temperature - median(data_1eel$temperature))/iqr
data_1eel$temperature_scaled <- data_1eel$temperature / 10

# Moon illumination
summary(data_1eel$moon_fraction)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0001508 0.1299341 0.4100736 0.4585004 0.7963852 0.9999797
#iqr <- 0.7966425   - (0.1311380)   #3rd QU - 1st Qu
#data_1eel$moon_fraction_centre <- (data_1eel$moon_fraction - median(data_1eel$moon_fraction))/iqr

# Sun altitude
summary(data_1eel$sun_altitude)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1.1107 -0.7703 -0.2729 -0.3103  0.1559  0.5006
#iqr <- 0.1584   - (-0.7657)   #3rd QU - 1st Qu
#data_1eel$sun_altitude_centre <- (data_1eel$sun_altitude - median(data_1eel$sun_altitude))/iqr

# Sun azimuth
summary(data_1eel$sun_azimuth)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -3.1400445 -1.3306698 -0.0000697  0.0006432  1.3316443  3.1411577
#iqr <- 1.322311    - (-1.326227)   #3rd QU - 1st Qu
#data_1eel$sun_azimuth_centre <- (data_1eel$sun_azimuth - median(subset$sun_azimuth))/iqr

Detrend the response variable

In case it would be needed, the response variable is detrended by subtracting a value by the previous value.

data_1eel <- data_1eel %>% 
  group_by(ID) %>%
  arrange(datetime) %>%
  mutate(diff_rel_depth = rel_depth - lag(rel_depth))
par(mfrow=c(2,1))
plot(data_1eel$datetime, data_1eel$rel_depth)
plot(data_1eel$datetime, data_1eel$diff_rel_depth)

Check ACF & PACF plot if data is stationary

forecast::Acf(data_1eel$diff_rel_depth, type = c("correlation"), lag.max=20, plot = TRUE)

forecast::Pacf(data_1eel$diff_rel_depth, lag.max=20, plot = TRUE)

Conduct model on a subset of 5 eels

Next, a model was applied to a subset of 5 eels with a similar migration path (through the Channel). I want to apply this complex model first on only a few fish before scaling up the complexity.

Load packages

library(lme4)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:pracma':
## 
##     expm, lu, tril, triu
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Attaching package: 'lme4'
## The following object is masked from 'package:nlme':
## 
##     lmList
library(glmmTMB)

Subset 5 eels for testing

data_5eels <- filter(data, ID == "16031" |
                   ID == "17535" |
                   ID == "17536" |
                   ID == "15777" |
                   ID == "17510")

data_5eels$ID <- factor(data_5eels$ID)
unique(data_5eels$ID)
## [1] 16031 15777 17510 17536 17535
## Levels: 15777 16031 17510 17535 17536

Remove data from eel 17535 when it was in the Atlantic Ocean

For this study I am interested in the vertical movement behaviour on the contintental shelf. The vertical behaviour in oceanic conditions has already been described.

data_5eels <- data_5eels[!(data_5eels$ID == "17535" & data_5eels$datetime > '2020-01-11 00:00:00'),]

Depth data processing

plot(data_5eels$corrected_depth)

Add daily max depth

max_depth <- data_5eels %>%
  group_by(ID, Date) %>%
  summarise(max_depth = min(corrected_depth))
## `summarise()` has grouped output by 'ID'. You can override using the `.groups`
## argument.
data_5eels <- left_join(data_5eels, max_depth, by = c("ID","Date"))

Add daily bathymetry based on the trajectory files obtained from the geolocation models

Note that I will not use this, but leave the code as is just in case.

# Link trajectory data (source trajectory data script)
tr_data <- read_csv("./data/external/trajectory_data/eel_trajectories.csv")
## Rows: 1962 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): Species, Sp_ID, Date, Max_Depth, Max_Temp
## dbl (18): ID, SD_Depth, Av_Depth, Min_Depth, SD_Temp, Av_Temp, Min_Temp, Tem...
## lgl  (2): Max_PoV_WF, Behav_Class
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Select columns
tr_data <- dplyr::select(tr_data, ID, Date, Max_Depth)
tr_data <- rename(tr_data, 
                  max_depth_bathymetry = Max_Depth)

# Process columns
tr_data$Date <- dmy(tr_data$Date)
tr_data$ID <- factor(tr_data$ID)

# Remove double dates per eel (ID)
#tr_data <- tr_data[!duplicated(tr_data[c('ID','Date')]),]
tr_data <- tr_data %>%     # Add ID number to duplicate dates
  group_by(ID, Date) %>%
  add_tally()

duplicates <- filter(tr_data, n == 2)   # Filter duplicate dates
duplicates <- duplicates %>%             # Add ID number to distinguish between first and second duplicate
  mutate(number_id = row_number())
duplicates <- filter(duplicates, number_id == 2)  # Filter second duplicates

tr_data <- filter(tr_data, n != 2)   # Remove duplicate dates from tracking dataset

# Bind 'duplicates' dataset with second duplicates to tracking dataset
tr_data <- ungroup(tr_data)
tr_data$n <- NULL
duplicates <- ungroup(duplicates)
duplicates$n <- NULL
duplicates$number_id <- NULL

tr_data <- rbind(tr_data, duplicates)

# Select relevant eels
tr_data <- filter(tr_data, ID == "16031" |
                   ID == "17535" |
                   ID == "17536" |
                   ID == "15777" |
                   ID == "17510")

tr_data$ID <- factor(tr_data$ID) # rerun 'factor()' so the number of levels is set accurately

data_5eels$Date <- as.Date(data_5eels$Date)
data_5eels <- left_join(data_5eels, tr_data, by = c('ID', 'Date'))

# Remove line with max depth bathymetry > 300 m --> remove location in deep sea
#data_5eels$max_depth_bathymetry <- as.numeric(data_5eels$max_depth_bathymetry)
#data_5eels <- filter(data_5eels, max_depth_bathymetry >= -300)

Plot the daily depth range, the max depth and the max depth bathymetry. The latter 2 do not differ a lot because one of the assumptions in the geolocation model is that the eel’s daily depth is the daily max bathymetry.

par(mfrow=c(3,1))
plot(data_5eels$Date, data_5eels$corrected_depth)
plot(data_5eels$Date, data_5eels$max_depth)
plot(data_5eels$Date, data_5eels$max_depth_bathymetry)

Calculate the relative depth

data_5eels$rel_depth <- data_5eels$corrected_depth / data_5eels$max_depth
plot(data_5eels$Date, data_5eels$rel_depth)

Create factor indicating if the sun is rising or setting

# Remove rows with NA values
data_5eels <- na.omit(data_5eels)

# Classify sun_altitude as rising or setting sun
data_5eels$sun_class <- NA
  for (i in 2:dim(data_5eels)[1]){
    if (data_5eels$sun_altitude[i] > data_5eels$sun_altitude[i-1]){
      data_5eels$sun_class[i] = "rising"
    } else{
      data_5eels$sun_class[i] = "setting"
    }}

# Set as factor
data_5eels$sun_class <- factor(data_5eels$sun_class)

# Remove first line of each eel, since that "sun_class" value is incorrect. The for-loop does not take into account different eels, but runs over the complete dataset.
data_5eels <- data_5eels %>%
  group_by(ID) %>%
  slice(-1)

Scale variables

For model construction, the exploratory variables need to be scaled. Max value should be between 0.5 - 5.

# direction_x
summary(data_5eels$direction_x)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -1.755500 -0.264700 -0.003000  0.007684  0.280800  1.614200
# direction_y
summary(data_5eels$direction_y)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -1.349200 -0.183000 -0.001000  0.009851  0.203600  1.720400
# Temperature
summary(data_5eels$temperature)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.391   9.547  10.703  10.548  11.688  13.203
data_5eels$temperature_scaled <- data_5eels$temperature / 10

# Moon illumination
summary(data_5eels$moon_fraction)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0000752 0.1598697 0.4552128 0.4806714 0.8092580 0.9999797
# Sun altitude
summary(data_5eels$sun_altitude)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1.1415 -0.7938 -0.2910 -0.3353  0.1378  0.5006
# Sun azimuth
summary(data_5eels$sun_azimuth)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -3.1415520 -1.3090434  0.0000505  0.0138621  1.3106045  3.1411577

Construct model with AR1 covariance structure

data_5eels <- data_5eels %>%
  arrange(ID, datetime) %>%
  group_by(ID) %>%
  mutate(times = factor(row_number()))

glmer from lme4 package

When running the glmer() function, it gives an error. Not sure why, but based on fora searches it might have to do with the data complexity and/or size.

#model_glmer <- glmer(rel_depth ~ direction_x_centre +
#                   direction_y_centre +
#                   moon_fraction_centre +
#                   sun_altitude_centre +
#                   sun_azimuth_centre +
#                 (1|ID),
#                 data = data_5eels,
#                 family = beta_family(),
#                 na.action = na.omit)
#summary(model_glmer)

Returns the following error:

#  Error in (function (fr, X, reTrms, family, nAGQ = 1L, verbose = 0L, maxit = 100L, : 
#(maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate

glmmTMB from the glmmTMB package

The function glmmTMB() runs when no AR1 construct is implemented. For more info on the method implementation, see this website. The glmmTMB R-package was used.

Update: I included the positive corrected_depth values, hence, the rel_depth now contains negative values so a model with beta-distribution is unable to run.

summary(data_5eels$rel_depth)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.34012  0.09211  0.52734  0.48313  0.83675  1.00000
#model_tmb <- glmmTMB(rel_depth ~ direction_x +
#                   direction_y +
#                   moon_fraction +
#                   sun_altitude +
#                   sun_azimuth,
#                 data = data_5eels,
#                 family = beta_family(),
#                 na.action = na.omit)
#summary(model_tmb)

When the AR1 term is added, a memory error occurs. This is probably due to the fact that this function is rather viable for datasets up to 100k (see documentation)

#model_tmb_ar1 <- glmmTMB(rel_depth ~ direction_x_centre +
#                   direction_y_centre +
#                   moon_fraction_centre +
#                   sun_altitude_centre +
#                   sun_azimuth_centre +
#                 ar1(times + 0 | ID),
#                 family = beta_family(),
#                 data = subset)

Returns the following error:

#Error: cannot allocate vector of size 4.8 Gb

bam from the mgcv package

The function bam from the mgcv package is designed for large datasets (see documentation).

To implement the AR1 strucgture in the model, select the first record per eel and flag as true.

# Select first record per eel and flag as 'TRUE'
t.first <- data_5eels[match(unique(data_5eels$ID), data_5eels$ID),]
t.first <- select(t.first, ID, datetime)
t.first$start.event <- TRUE

# Merge with dataset
data_5eels <- left_join(data_5eels, t.first, by = c("ID", "datetime"))

# Set NA as FALSE
data_5eels[c("start.event")][is.na(data_5eels[c("start.event")])] <- FALSE
# Run gamm model with autocorrelation argument

# Full model
bam_model <- bam(rel_depth ~ s(direction_x) +
                   s(direction_y) +
                   s(moon_fraction) +
                   s(sun_altitude) +
                   s(sun_azimuth) +
                   sun_class +
                   s(ID, bs="re"),
                 family = gaussian(), data = data_5eels, discrete = TRUE,
                 #AR.start=subset$start.event, 
                 rho=0.90,
                 na.action = na.omit)

Calculate rho (= acf value) which can be fed into the bam_model for model optimisation. However, this results in an error for a model of the beta family.

# Calculate Rho, the autocorrelation value of a specific lag
#valRho <- acf(resid(bam_model), plot=TRUE)$acf[2]
#Error in plot.window(...) : need finite 'ylim' values

Show summary of the model

summary(bam_model)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## rel_depth ~ s(direction_x) + s(direction_y) + s(moon_fraction) + 
##     s(sun_altitude) + s(sun_azimuth) + sun_class + s(ID, bs = "re")
## 
## Parametric coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.477439   0.037303   12.80   <2e-16 ***
## sun_classsetting 0.002718   0.007542    0.36    0.719    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                    edf Ref.df      F  p-value    
## s(direction_x)   4.361  5.591  1.809    0.098 .  
## s(direction_y)   1.001  1.001  0.651    0.420    
## s(moon_fraction) 4.236  5.194  1.659    0.137    
## s(sun_altitude)  1.001  1.001  0.325    0.569    
## s(sun_azimuth)   4.670  5.855  5.226 3.15e-05 ***
## s(ID)            3.814  5.000 16.758  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0499   Deviance explained = 5.04%
## fREML = -18593  Scale est. = 0.13465   n = 45020

Check the model

par(mfrow = c(2, 2))
gam.check(bam_model)

## 
## Method: fREML   Optimizer: perf chol
## $grad
## [1]  1.562343e-06 -3.118314e-04  1.348723e-06 -2.585771e-04  5.227809e-05
## [6]  3.020207e-07  1.004495e-04
## 
## $hess
##            [,1]          [,2]          [,3]          [,4]          [,5]
##    8.215911e-01 -9.877175e-06  1.445064e-02 -1.563327e-06  4.341655e-03
##   -9.877175e-06  3.117270e-04  5.399458e-06 -2.759905e-10  2.845264e-06
##    1.445064e-02  5.399458e-06  5.593165e-01 -1.391139e-06  6.930835e-03
##   -1.563327e-06 -2.759905e-10 -1.391139e-06  2.584836e-04 -5.221674e-05
##    4.341655e-03  2.845264e-06  6.930835e-03 -5.221674e-05  1.165441e+00
##    7.553825e-03 -2.268884e-07  1.127069e-02 -3.021913e-07 -5.409676e-03
## d -1.680286e+00 -6.103953e-05 -1.618025e+00 -1.002699e-04 -1.834949e+00
##            [,6]          [,7]
##    7.553825e-03 -1.680286e+00
##   -2.268884e-07 -6.103953e-05
##    1.127069e-02 -1.618025e+00
##   -3.021913e-07 -1.002699e-04
##   -5.409676e-03 -1.834949e+00
##    1.818722e+00 -1.906891e+00
## d -1.906891e+00  2.250650e+04
## 
## Model rank =  52 / 52 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                    k'  edf k-index p-value    
## s(direction_x)   9.00 4.36    0.68  <2e-16 ***
## s(direction_y)   9.00 1.00    0.69  <2e-16 ***
## s(moon_fraction) 9.00 4.24    0.95  <2e-16 ***
## s(sun_altitude)  9.00 1.00    1.00    0.68    
## s(sun_azimuth)   9.00 4.67    1.01    0.71    
## s(ID)            5.00 3.81      NA      NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

GAM on data of 1 eel

This document describes the GAM procedure following Zuur et al. (2012) ‘Beginner’s Guide to Generalized Additive Models with R’. The analysis is applied on a single eel (ID A16031) and a few days for simplicity. Spoiler alert: it shows that the model runs very good on a few days but is off when the amount of data (i.e. days of data) is increased.

Subset on eel A16031

subset_1eel <- filter(data_5eels, ID == "16031")

Plot the depth data. Note that the depth sensor data drifts. A linear regression was applied to correct for this drift, hence the variable name ‘corrected_depth’. Also, the relative depth might be a better response variable and was calculated as the corrected_depth divided by the max depth for that day. It is assumed that the max depth resembles the sea bottom.

ggplot(subset_1eel, aes(x = datetime, y = corrected_depth)) +
  geom_line() +
  theme_minimal() +
  ylab("Depth (m)") +
  xlab("Date") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  scale_x_datetime(date_breaks  = "5 days") 

ggplot(subset_1eel, aes(x = datetime, y = rel_depth)) +
  geom_line() +
  theme_minimal() +
  ylab("Depth (m)") +
  xlab("Date") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  scale_x_datetime(date_breaks  = "5 days") 

Subset two days of data in the beginning of the track, around half December

subset_1eel_2days_1 <- filter(subset_1eel, datetime > '2018-12-12 00:00:00',
                              datetime < '2018-12-14 00:00:00')

ggplot(subset_1eel_2days_1, aes(x = datetime, y = corrected_depth)) +
  geom_line() +
  theme_minimal() +
  ylab("Depth (m)") +
  xlab("Date") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  scale_x_datetime(date_breaks  = "1 day") 

ggplot(subset_1eel_2days_1, aes(x = datetime, y = rel_depth)) +
  geom_line() +
  theme_minimal() +
  ylab("Depth (m)") +
  xlab("Date") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  scale_x_datetime(date_breaks  = "1 day") 

Apply the GAM. One with a Gaussian distribution and one with a binomial distribution (since the data is between 0 and 1). The explanatory variables I am interested in are the current variables (obtained via CEFAS model) and the factor with circadian phases ‘day’ or ‘night’. The current variables are split in two vectors giving the current speed in direction ‘x’ and ‘y’. Before the variables are fed into the GAM, a correlation analysis is applied between direction_x and direction_y. When a strong correlation (> |0.6|) is observed, variable ‘direction_x’ is chosen.

Despite the values of the response variable are between 0 and 1, indicating a binomial distribution, the GAM with a Gaussian distribution results in the best fit and lowest AIC. Further, note that the edf in the model output illustrate non-linear relationships.

cor(subset_1eel_2days_1$direction_x, subset_1eel_2days_1$direction_y)
## [1] 0.9327713
gam1 <- gam(rel_depth ~ s(direction_x, by = night_day) + night_day, data = subset_1eel_2days_1)
gam2 <- gam(rel_depth ~ s(direction_x, by = night_day) + night_day, data = subset_1eel_2days_1, family = binomial(link = "logit"))
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
AIC(gam1, gam2)
##             df       AIC
## gam1 13.294183  41.90012
## gam2  7.653967 492.52814
summary(gam1)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## rel_depth ~ s(direction_x, by = night_day) + night_day
## 
## Parametric coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     0.70278    0.02116   33.21   <2e-16 ***
## night_daynight -0.39104    0.02499  -15.65   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                 edf Ref.df      F p-value    
## s(direction_x):night_dayday   2.030  2.541  6.178 0.00111 ** 
## s(direction_x):night_daynight 8.264  8.848 34.471 < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.558   Deviance explained = 56.7%
## GCV = 0.062786  Scale est. = 0.061444  n = 575

In a next step, the smoothers are plotted for the best model (gam1)

par(mfrow = c(2,1))
plot(gam1, select = 1, main = "Day")
plot(gam1, select = 2, main = "Night")

Check model performance.

par(mfrow = c(2, 2))
gam.check(gam1)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 11 iterations.
## The RMS GCV score gradient at convergence was 5.677614e-07 .
## The Hessian was positive definite.
## Model rank =  20 / 20 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                                 k'  edf k-index p-value    
## s(direction_x):night_dayday   9.00 2.03    0.29  <2e-16 ***
## s(direction_x):night_daynight 9.00 8.26    0.29  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
subset_1eel_2days_1$residuals <- residuals(gam1, type="response")
subset_1eel_2days_1$predicted <- predict.bam(gam1)
subset_1eel_2days_1$fitted <- fitted(gam1, type="response")


plot <- ggplot(subset_1eel_2days_1, aes(x=numericdate, y=rel_depth)) +
  geom_line(alpha = 0.2) +
  geom_line(subset_1eel_2days_1, mapping = aes(x=numericdate, y=fitted), colour = "red") 
plot

E1 <- resid(gam1, type = "pearson")
F1 <- fitted(gam1, type = "response")
par(mfrow = c(2,2))
plot(x = F1, y = E1, xlab = "Fitted values", ylab = "Pearson residuals")
plot(x = subset_1eel_2days_1$direction_x, y = E1, xlab = "Direction x", ylab = "Pearson residuals")
abline(h = 0, lty = 2)
plot(E1 ~ night_day, data = subset_1eel_2days_1, ylab = "Pearson residuals", xlab = "Day Night")
hist(E1, xlab = "Pearson residuals", main = "")

Check overdispersion

E1 <- resid(gam1, type = "pearson")
N <- nrow(subset_1eel_2days_1)
p <- length(coef(gam1))
overdispersion <- sum(E1^2)/(N-p)
overdispersion
## [1] 0.06229675

Subset two days of data toward the end of the track when the vertical movement becomes more pronounced.

subset_1eel_2days_2 <- filter(subset_1eel, datetime > '2019-02-04 00:00:00',
                              datetime < '2019-02-06 00:00:00')

ggplot(subset_1eel_2days_2, aes(x = datetime, y = corrected_depth)) +
  geom_line() +
  theme_minimal() +
  ylab("Depth (m)") +
  xlab("Date") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  scale_x_datetime(date_breaks  = "1 day") 

ggplot(subset_1eel_2days_2, aes(x = datetime, y = rel_depth)) +
  geom_line() +
  theme_minimal() +
  ylab("Depth (m)") +
  xlab("Date") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  scale_x_datetime(date_breaks  = "1 day") 

Apply GAM

cor(subset_1eel_2days_2$direction_x, subset_1eel_2days_2$direction_y)
## [1] 0.8510163
gam1 <- gam(rel_depth ~ s(direction_x, by = night_day) + night_day, data = subset_1eel_2days_2)
gam2 <- gam(rel_depth ~ s(direction_x, by = night_day) + night_day, data = subset_1eel_2days_2, family = binomial(link = "logit"))
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
AIC(gam1, gam2)
##            df       AIC
## gam1 18.64829 -135.9035
## gam2 16.09634  344.0297
summary(gam1)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## rel_depth ~ s(direction_x, by = night_day) + night_day
## 
## Parametric coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     0.49473    0.01710  28.939  < 2e-16 ***
## night_daynight  0.08625    0.02208   3.907 0.000112 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                 edf Ref.df     F p-value    
## s(direction_x):night_dayday   6.798  7.703 25.53  <2e-16 ***
## s(direction_x):night_daynight 8.850  8.993 41.70  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.612   Deviance explained = 62.8%
## GCV = 0.040936  Scale est. = 0.03905   n = 383

In a next step, the smoothers are plotted for the best model (gam1)

par(mfrow = c(2,1))
plot(gam1, select = 1, main = "Day")
plot(gam1, select = 2, main = "Night")

Check model performance.

par(mfrow = c(2, 2))
gam.check(gam1)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 11 iterations.
## The RMS GCV score gradient at convergence was 7.60565e-07 .
## The Hessian was positive definite.
## Model rank =  20 / 20 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                                 k'  edf k-index p-value    
## s(direction_x):night_dayday   9.00 6.80    0.33  <2e-16 ***
## s(direction_x):night_daynight 9.00 8.85    0.33  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
subset_1eel_2days_2$residuals <- residuals(gam1, type="response")
subset_1eel_2days_2$predicted <- predict.bam(gam1)
subset_1eel_2days_2$fitted <- fitted(gam1, type="response")


plot <- ggplot(subset_1eel_2days_2, aes(x=numericdate, y=rel_depth)) +
  geom_line(alpha = 0.2) +
  geom_line(subset_1eel_2days_2, mapping = aes(x=numericdate, y=fitted), colour = "red") 
plot

E1 <- resid(gam1, type = "pearson")
F1 <- fitted(gam1, type = "response")
par(mfrow = c(2,2))
plot(x = F1, y = E1, xlab = "Fitted values", ylab = "Pearson residuals")
plot(x = subset_1eel_2days_2$direction_x, y = E1, xlab = "Direction x", ylab = "Pearson residuals")
abline(h = 0, lty = 2)
plot(E1 ~ night_day, data = subset_1eel_2days_2, ylab = "Pearson residuals", xlab = "Day Night")
hist(E1, xlab = "Pearson residuals", main = "")

Check overdispersion

E1 <- resid(gam1, type = "pearson")
N <- nrow(subset_1eel_2days_2)
p <- length(coef(gam1))
overdispersion <- sum(E1^2)/(N-p)
overdispersion
## [1] 0.03930261

Subset six days of data toward the end of the track when the vertical movement becomes more pronounced.

subset_1eel_2days_6 <- filter(subset_1eel, datetime > '2019-02-04 00:00:00',
                              datetime < '2019-02-10 00:00:00')

ggplot(subset_1eel_2days_6, aes(x = datetime, y = corrected_depth)) +
  geom_line() +
  theme_minimal() +
  ylab("Depth (m)") +
  xlab("Date") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  scale_x_datetime(date_breaks  = "1 day") 

ggplot(subset_1eel_2days_6, aes(x = datetime, y = rel_depth)) +
  geom_line() +
  theme_minimal() +
  ylab("Depth (m)") +
  xlab("Date") +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14)) +
  scale_x_datetime(date_breaks  = "1 day") 

Apply GAM

Note that the R² adj reduced by more than half (33%) by addding 4 more days of data.

cor(subset_1eel_2days_6$direction_x, subset_1eel_2days_6$direction_y)
## [1] 0.6764244
gam1 <- gam(rel_depth ~ s(direction_x, by = night_day) + night_day, data = subset_1eel_2days_6)
gam2 <- gam(rel_depth ~ s(direction_x, by = night_day) + night_day, data = subset_1eel_2days_6, family = binomial(link = "logit"))
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
AIC(gam1, gam2)
##            df       AIC
## gam1 17.37228  108.7839
## gam2 11.11351 1315.2891
summary(gam1)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## rel_depth ~ s(direction_x, by = night_day) + night_day
## 
## Parametric coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     0.51872    0.01202  43.142  < 2e-16 ***
## night_daynight  0.09847    0.01560   6.311 3.96e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                 edf Ref.df     F p-value    
## s(direction_x):night_dayday   7.922  8.677 19.26  <2e-16 ***
## s(direction_x):night_daynight 6.450  7.535 43.38  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.318   Deviance explained = 32.7%
## GCV = 0.064255  Scale est. = 0.063341  n = 1151

In a next step, the smoothers are plotted for the best model (gam1)

par(mfrow = c(2,1))
plot(gam1, select = 1, main = "Day")
plot(gam1, select = 2, main = "Night")

Check model performance. There seems to be a trend in de residuals.

par(mfrow = c(2, 2))
gam.check(gam1)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 7 iterations.
## The RMS GCV score gradient at convergence was 2.439333e-07 .
## The Hessian was positive definite.
## Model rank =  20 / 20 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                                 k'  edf k-index p-value    
## s(direction_x):night_dayday   9.00 7.92    0.17  <2e-16 ***
## s(direction_x):night_daynight 9.00 6.45    0.17  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model fit is reduced substantially.

subset_1eel_2days_6$residuals <- residuals(gam1, type="response")
subset_1eel_2days_6$predicted <- predict.bam(gam1)
subset_1eel_2days_6$fitted <- fitted(gam1, type="response")


plot <- ggplot(subset_1eel_2days_6, aes(x=numericdate, y=rel_depth)) +
  geom_line(alpha = 0.2) +
  geom_line(subset_1eel_2days_6, mapping = aes(x=numericdate, y=fitted), colour = "red") 
plot

E1 <- resid(gam1, type = "pearson")
F1 <- fitted(gam1, type = "response")
par(mfrow = c(2,2))
plot(x = F1, y = E1, xlab = "Fitted values", ylab = "Pearson residuals")
plot(x = subset_1eel_2days_6$direction_x, y = E1, xlab = "Direction x", ylab = "Pearson residuals")
abline(h = 0, lty = 2)
plot(E1 ~ night_day, data = subset_1eel_2days_6, ylab = "Pearson residuals", xlab = "Day Night")
hist(E1, xlab = "Pearson residuals", main = "")

Check overdispersion

E1 <- resid(gam1, type = "pearson")
N <- nrow(subset_1eel_2days_2)
p <- length(coef(gam1))
overdispersion <- sum(E1^2)/(N-p)
overdispersion
## [1] 0.1979843